On the impact of alignment on voice conversion performance

نویسندگان

  • Elina Helander
  • Jan Schwarz
  • Jani Nurminen
  • Hanna Silén
  • Moncef Gabbouj
چکیده

Most of the current voice conversion systems model the joint density of source and target features using a Gaussian mixture model. An inherent property of this approach is that the source and target features have to be properly aligned for the training. It is intuitively clear that the accuracy of the alignment has some effect on the conversion quality but this issue has not been thoroughly studied in the literature. Examples of alignment techniques include the usage of a speech recognizer with forced alignment or dynamic time warping (DTW). In this paper, we study the effect of alignment on voice conversion quality through extensive experiments and discuss issues that should be considered. The main outcome of the study is that alignment clearly matters but with simple voice activity detection, DTW and some constraints we can achieve the same quality as with hand-marked labels.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

طراحی یک روش آموزش ناموازی جدید برای تبدیل گفتار با عملکردی بهتر از آموزش موازی

Introduction: The art of voice mimicking by computers, has with the computer have been one of the most challenging topics of speech processing in recent years. The system of voice conversion has two sides. In one side, the speaker is the source that his or her voice has been changed for mimicking the target speaker’s voice (which is on the other side). Two methods of p...

متن کامل

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...

متن کامل

Frame alignment method for cross-lingual voice conversion

Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult ...

متن کامل

Voice conversion using precise speech alignment based on spectral property and eigen-codeword distribution

While voice conversion methods have been popularly applied to convert the speech signals uttered by a source speaker to a target speaker, frame-based voice conversion generally suffers from incorrect alignment using only spectral distance and therefore generate improper conversion results. In a parallel phone sequence, the alignment using minimum spectral distance between frame-based feature ve...

متن کامل

The Impact of Organizational Culture on Organizational Silence and Voice of Faculty Members of Islamic Azad University in Tehran

Organizational voice and silence are two intertwined strategies in organizations. Organizational voice is an appropriate communicational tool in order to improve the organization. On the other hand, silence can have destructive effects on the decision-making process and organizational changes. Organizational culture is the important factor which facilitates ways of expressing ideas by employees...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008